    Automatic Human Joint Detection Using Microsoft Kinect

    Automatic human joint detection is used in many applications nowadays. In this paper, we propose a method to detect full-body human joints using depth and color images. The proposed solution is divided into three stages: an image preprocessing stage, a distance transform stage, and an anthropometric constraint analysis stage. The output of our solution is a stickman model with the same pose as in the given input image. Our implementation uses a Microsoft Kinect RGB and depth camera with 480x640 image resolution. The performance of this solution is demonstrated on several human postures.
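
    The abstract does not give the stage parameters, so the following is only a minimal, hedged sketch of the three-stage pipeline, assuming OpenCV: the depth band, morphology kernel, and helper names are illustrative placeholders, not the authors' values.

```python
# Hedged sketch of the three stages described above. Thresholds and
# anthropometric choices are illustrative assumptions, not the paper's.
import cv2
import numpy as np

def segment_person(depth_mm, near=500, far=3000):
    """Preprocessing stage: keep pixels inside a depth band and treat
    the result as the person silhouette (binary mask)."""
    mask = ((depth_mm > near) & (depth_mm < far)).astype(np.uint8) * 255
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

def torso_center(mask):
    """Distance transform stage: the silhouette pixel farthest from the
    boundary sits in the thickest body region, a stable torso estimate."""
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    _, _, _, max_loc = cv2.minMaxLoc(dist)
    return max_loc  # (x, y)

def extremity_candidates(mask, center, k=5):
    """Anthropometric constraint stage (sketch): silhouette points
    farthest from the torso center are head/hand/foot joint candidates,
    to be filtered by body-proportion constraints."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(np.float32)
    d = np.linalg.norm(pts - np.asarray(center, np.float32), axis=1)
    return pts[np.argsort(d)[-k:]]
```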

    IDEnet: Inception-Based Deep Convolutional Neural Network for Crowd Counting Estimation

    In the crowd counting task, the goals are to estimate a density map and a count of people from a given crowd image. From our analysis, there are two major problems that need to be solved in crowd counting: the scale-invariance problem and the inhomogeneous-density problem. Many methods have been developed to tackle these problems by designing density-aware models, scale-adaptive models, etc. Our approach is driven by these two problems, and we propose a density-aware, Inception-based neural network to tackle both. We introduce our novel Inception-based crowd counting model called the Inception Dense Estimator network (IDEnet). IDEnet is divided into two modules: the Inception Dense Block (IDB) and the Dense Evaluator Unit (DEU). Several variations of IDEnet are evaluated and analysed to find the best model. We evaluate our best model on the UCF50 and ShanghaiTech datasets. IDEnet outperforms the current state-of-the-art method on the ShanghaiTech Part B dataset. We conclude our work with six key conclusions based on our experiments and error analysis.
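
    The abstract does not specify the IDB or DEU configuration, so the PyTorch sketch below only illustrates the general multi-scale, Inception-style idea behind the IDB; branch widths, kernel sizes, and depth are assumptions.

```python
# Hedged PyTorch sketch of an Inception-style block for density-map
# regression, in the spirit of the IDB above. Branch widths and kernel
# sizes are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class InceptionDenseBlock(nn.Module):
    """Parallel 1x1/3x3/5x5 convolutions see people at different scales
    (near vs. far from the camera); outputs are concatenated."""
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, 1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, 5, padding=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.b1(x), self.b3(x), self.b5(x)], 1))

class TinyIDEnet(nn.Module):
    """Two stacked blocks, then a 1x1 convolution regresses a single
    density map; its spatial sum approximates the crowd count."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(InceptionDenseBlock(3),
                                    InceptionDenseBlock(48))
        self.head = nn.Conv2d(48, 1, 1)

    def forward(self, x):
        density = self.head(self.blocks(x))
        return density, density.sum(dim=(1, 2, 3))
```

    Summing the predicted density map yields the count, which is why density-map regression copes with inhomogeneous crowds better than predicting a single number from the whole image.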

    CountNet: End to End Deep Learning for Crowd Counting

    We approach the crowd counting problem as an end-to-end deep learning process that requires both correct recognition and correct counting. This paper redefines crowd counting as a counting process, rather than just a recognition process as previously defined. CountNet uses the Xception network, layered again with fully connected layers. The pre-trained Xception parameters are used for transfer learning and trained again together with the fully connected layers. CountNet then achieves better crowd counting performance by training on an augmented dataset that makes it robust to scale and slice variations.
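
    As a hedged illustration of this transfer-learning setup, the sketch below attaches fully connected layers to a pre-trained Xception backbone obtained via the `timm` package; the framework choice, hidden sizes, and backbone name are assumptions, not the paper's implementation.

```python
# Hedged sketch of the CountNet idea: a pre-trained Xception backbone
# with fully connected layers regressing the count directly. The timm
# backbone and hidden sizes are assumptions, not the paper's setup.
import timm
import torch.nn as nn

class CountNet(nn.Module):
    def __init__(self):
        super().__init__()
        # num_classes=0 makes timm return pooled features, not logits.
        # The model name may differ across timm versions
        # (e.g. 'legacy_xception').
        self.backbone = timm.create_model('xception', pretrained=True,
                                          num_classes=0)
        self.head = nn.Sequential(
            nn.Linear(self.backbone.num_features, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # scalar crowd count
        )

    def forward(self, x):
        return self.head(self.backbone(x)).squeeze(-1)
```

    Training would then minimize, for example, the mean squared error between predicted and ground-truth counts over the scale- and slice-augmented crops.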

    InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

    Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate that InstructTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations can be found at https://github.com/WillyHC22/InstructTODS.
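
    The actual prompts and KB interface live in the linked repository; the sketch below is only a hedged illustration of the proxy-belief-state loop using the OpenAI chat API, with the prompt wording, model name, and `query_kb` helper as assumptions.

```python
# Hedged sketch of the proxy-belief-state loop described above: the LLM
# first summarizes the dialogue into structured constraints, which drive
# a KB lookup, and then generates the response. Prompt wording, model
# name, and the query_kb callable are assumptions; see the linked
# repository for the actual InstructTODS implementation.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def proxy_belief_state(history):
    """Zero-shot extraction of the user's constraints as a JSON object."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Summarize the user's current goal as a JSON "
                        "object of slot-value pairs, "
                        "e.g. {\"area\": \"north\"}."},
            {"role": "user", "content": "\n".join(history)},
        ],
    )
    return json.loads(resp.choices[0].message.content)  # assumes clean JSON

def respond(history, query_kb):
    """One end-to-end turn: belief state -> KB query -> system response."""
    belief = proxy_belief_state(history)
    results = query_kb(belief)  # any backend: SQL, API, in-memory filter
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Dialogue:\n{history}\nKB results:\n{results}\n"
                              "Write a helpful, informative reply to the user."}],
    )
    return resp.choices[0].message.content
```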

    Instruct-Align: Teaching Novel Languages to LLMs through Alignment-based Cross-Lingual Instruction

    Instruction-tuned large language models (LLMs) have shown remarkable generalization capability over multiple tasks in multiple languages. Nevertheless, their generalization towards different languages varies, especially for underrepresented or even unseen languages. Prior works on adapting new languages to LLMs find that naively adapting new languages to instruction-tuned LLMs results in catastrophic forgetting, which in turn causes the loss of multitasking ability in these LLMs. To tackle this, we propose Instruct-Align, a.k.a. the (IA)¹ framework, which enables instruction-tuned LLMs to learn cross-lingual alignment between unseen and previously learned languages via alignment-based cross-lingual instruction tuning. Our preliminary result on BLOOMZ-560M shows that (IA)¹ is able to learn a new language effectively with only a limited amount of parallel data, while preventing catastrophic forgetting by applying continual instruction tuning through experience replay. Our work contributes to the progression of language adaptation methods for instruction-tuned LLMs and opens up the possibility of adapting underrepresented low-resource languages into existing instruction-tuned LLMs. Our code will be publicly released upon acceptance.
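
    As a hedged sketch of the experience-replay side of the continual instruction tuning described above, the generator below interleaves replayed examples from previously learned tasks into the stream of new-language alignment examples; the replay ratio and names are illustrative, not the paper's settings.

```python
# Hedged sketch of continual instruction tuning with experience replay:
# each pass over the new-language alignment data is interleaved with
# replayed examples from previously learned tasks, which is what guards
# against catastrophic forgetting. Ratio and names are illustrative.
import random

def replay_mixture(new_examples, old_examples, replay_ratio=0.25):
    """Yield a training stream mixing new alignment examples with
    replayed old-task instructions."""
    for example in new_examples:
        yield example  # alignment-based cross-lingual instruction
        if old_examples and random.random() < replay_ratio:
            yield random.choice(old_examples)  # experience replay
```

    Feeding this mixed stream to an ordinary instruction-tuning loop keeps earlier tasks represented in every epoch, so the new language is learned without overwriting the model's existing multitask ability.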

    Survey of Social Bias in Vision-Language Models

    In recent years, the rapid advancement of machine learning (ML) models, particularly transformer-based pre-trained models, has revolutionized the Natural Language Processing (NLP) and Computer Vision (CV) fields. However, researchers have discovered that these models can inadvertently capture and reinforce social biases present in their training datasets, leading to potential social harms such as uneven resource allocation and unfair representation of specific social groups. Addressing these biases and ensuring fairness in artificial intelligence (AI) systems has become a critical concern in the ML community. The recent introduction of pre-trained vision-and-language (VL) models in the emerging multimodal field demands attention to the potential social biases present in these models as well. Although VL models are susceptible to social bias, our understanding of it remains limited compared to the extensive discussions on bias in NLP and CV. This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL. By examining these perspectives, the survey aims to offer valuable guidelines on how to approach and mitigate social bias in both unimodal and multimodal settings. The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and non-biased AI models in various applications and research endeavors.